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ABSTRACT 

Cross-correlation techniques provide a promising avenue for calibrating photometric redshifts and deter- 
mining redshift distributions using spectroscopy which is systematically incomplete (e.g., current deep spec- 
troscopic surveys fail to obtain secure redshifts for 30-50% or more of the galaxies targeted). In this paper 
we improve on the redshift distribution reconstruction methods from our previous work by incorporating full 
covariance information into our correlation function fits. Correlation function measurements are strongly co- 
variant between angular or spatial bins, and accounting for this in fitting can yield substantial reduction in 
errors. However, frequently the covariance matrices used in these calculations are determined from a relatively 
small set (dozens rather than hundreds) of subsamples or mock catalogs, resulting in noisy covariance matrices 
whose inversion is ill-conditioned and numerically unstable. We present here a method of conditioning the 
covariance matrix known as ridge regression which results in a more well behaved inversion than other tech- 
niques common in large-scale structure studies. We demonstrate that ridge regression significantly improves 
the determination of correlation function parameters. We then apply these improved techniques to the problem 
of reconstructing redshift distributions. By incorporating full covariance information, applying ridge regres- 
sion, and changing the weighting of fields in obtaining average correlation functions, we obtain reductions in 
the mean redshift distribution reconstruction error of as much as ^ 40% compared to previous methods. In an 
appendix, we provide a description of POWERFIT, an IDL code for performing power-law fits to correlation 
functions with ridge regression conditioning that we are making publicly available. 

Subject headings: galaxies: distances and redshifts — large-scale structure of the universe — surveys — cos- 
mology: observations 



1. EVITRODUCTION 

Many of the cosmological measurements to be performed 
with future photometric surveys will require extremely 
well-characterized redshift di stributions ( lAlbrecht et al.l2006l : 
iHuterer et aT]|2006l: iMa et al.ll2006,) . A key challenge will be 
to obtain redshift information for galaxies from large multi- 
wavelength samples, which include objects that are too faint 
for spectroscopy or fail to yield secure redshifts. A conven- 
tional approach is to use large sets of spectroscopic redshifts 
to establish a relation between redshi ft and colo r informa- 
tion to calibrate photometric redshifts jConnollv e t al. 1995]; 
iLima et alJl2"008HBudavaril2009HFreeman et alj|2009i) . How- 
ever, at the depths probed by these surveys, existing large red- 
shift samples have all been highly incomplete, with a strong 
dependence of success rate on both redshift and galaxy prop- 
e rties ( Cooper et al. 2006). 

iNewmaa (,2008i) described a new technique for calibrat- 
ing photometric redshifts (commonly referred to as photo-z's) 
using cross-correlations which exploits the fact that galax- 
ies at s imilar redshifts t end to cluster with each other, and 
in Ma tthews & NewmanI (2010) (from here on referred to as 
MNIO) we tested this technique using realistic mock catalogs 
which include the impact of bias evolution and cosmic vari- 
ance. We showed that for objects in a photometric redshift 
bin (e.g., selected using some photo-z-based algorithm), we 
can recover its true redshift distribution, (ppiz), by measur- 
ing the two-point angular cross-correlation between objects in 
that bin with a bright spectroscopic sample in the same region 
of the sky, as a function of spectroscopic z. The cross-corre- 
lation signal at a given redshift will depend on both the intrin- 
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sic clustering of the samples with each other and the fraction 
of the photometric sample that lies at that redshift (i.e., 0p). 
Autocorrelation measurements give information about the in- 
trinsic clustering of each sample and can be used to break this 
degeneracy. 

In MNIO, we assumed for convenience that correlation 
function measurements in different angular/radial bins were 
completely independent. However, analytical models as well 
as simulations have shown that the covariance between bins is 
significant (Bernstein 1994; Zehavi et al. 2005; Crocce et al] 
l20i il) . Incorporating all available information about this co- 
variance should provide better constraints on the correlation 
function parameters used in reconstructing 0p(z). In this pa- 
per we improve on the methods of MNIO by accounting for 
this covariance. 

However, the inversion of covariance matrices calculated 
from relatively small sample sizes (e.g. a modest number of 
mock catalogs or jackknife regions) is not well behaved: mod- 
est noise in a covariance matrix can yield large variations in 
its inverse. We therefore also incorporate ridge regression, 
a method of conditioning covariance matrices (i.e., stabiliz- 
ing the calculation of their inverse) which is common in the 
statistics literature but novel to correlation function analyses, 
into our methods. We will then optimize the reconstruction of 
<t>piz) by varying the level of this conditioning. Throughout 
this paper we assume a flat ACDM cosmology with il„,=0.3, 
riA=0.7, and Hubble parameter //q = lOO/z km s"' Mpc"'; we 
have assumed /i=0.72, matching the Millennium simulations 
used for this work, where it is not explicitly included in for- 
mulae. 

The structure of this paper is as follows. In ^we describe 
the catalogs and data sets used. In ^we provide a summary 
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of the reconstruction technique used in [Matthews & NewmanI 
(120101) . as well as a description of the changes implemented 
for this paper. In ^we detail the optimization of the recon- 
struction achieved using a risk analysis. We summarize the 
results and discuss the gains from ridge regression in ^ Fi- 
nally, we include an appendix describing the public release of 
POWERFIT, an IDL program which performs fits for power- 
law correlation function parameters incorporating covariance 
information and conditioning via either ridge regression or 
singular value trimming. 

2. DATA SETS 

To test this method, it is necessary to construct two sam- 
ples of galaxies, one with known redshift ("spectroscopic") 
and the other unknown ("photometric"). We have done this 
using mock DEEP2 Redshift Survey light cones produced by 
Darren Croton. A total of 24 light cones were constructed by 
taking lines-of-sight through the Millennium Simulation halo 
catalog (Lemson & Virgo Consortiu m 2006) with the redshift 
of the sim ulation cube used increasin g with distance from the 
observer (iKitzbichler & Whit3l2007h . The light cones were 
then populated with galaxies using a semi-analytic model 
whose par ameters were chos en to reproduce local galaxy 
properties dCroton et al.l 12006). Each light cone covers the 
range 0. 10 < z < 1 .5 and corresponds to a 0.5 x 2.0 degree re- 
gion of sky, roughly equivalent to the area of a single DEEP2 
survey field. As a consequence, we will commonly refer to 
the individual light cones as "fields" in the remainder of this 
paper. 

The spectroscopic sample is generated by randomly select- 
ing 60% of objects with observed /?-band magnitude R < 
24.1, resembling the selection probability and depth of the 
DEEP2 Galaxy Redshift survey. The mean number of spec- 
troscopic objects in a single light cone is 35,574. The other 
sample, referred to hereafter as the photometric sample, is 
constructed by selecting objects in the mock catalog down 
to the faintest magnitudes available, with the probability of 
inclusion a Gaussian with (z) = 0.75 and a. = 0.20. This emu- 
lates choosing a set of objects which have been placed in a sin- 
gle photometric redshift bin by some algorithm with Gaussian 
errors. Although the selection function used is a Gaussian, the 
net z distribution of the photometric sample will not be a pure 
Gaussian since the redshift distribution of the mock catalog 
we select from is not uniform. After applying this Gaussian 
selection to the mock catalog, we randomly reject half of the 
selected objects in order to reduce calculation time. The mean 
number of objects in the final photometric sample over the 24 
light cones is 44,053. 

The mock catalog includes both the cosmological redshift 
as well as the observed redshift for each object. The observed 
redshift in corporates the effects of redshift-space distortions 
(lHamiltonl [l998). and is the redshift value used for objects in 
the spectroscopic sample. When plotting the redshift distribu- 
tion of the photometric sample we use the cosmological red- 
shifts for each object (differences are small). Throughout the 
remainder of this paper, quantities related to the spectroscopic 
sample, i.e. objects with known redshifts, will be labeled with 
the subscript 'i', while those related to the photometric sam- 
ple will be labelled with subscript 'p'. 

3. METHODS 

3.1. Correlation Functions 

The basic quantities we will use to determine redshift distri- 
butions are the real space two-point correlation function and 



the angular two-point correlation function. The real space 
two-point correlation function ^(r) is a measure of the excess 
probability dP (above that for a random distribution) of find- 
ing a galaxy in a volume dV, at a separation r from another 
galaxy, dP = n[l + £,{r)]dV (Peebles 1980), where n is the 
mean number density of the sample. The angular two-point 
correlation function w{9) is a measure of the excess probabil- 
ity dP of finding a galaxy in a solid angle dil, at a sep aration 9 
on the sky from another galaxy, dP = + w(9)]dfl jPeeblesI 
11980^, where S is the mean number of galaxies per steradian. 

To calculate correlation functions we bin the distance be- 
tween object pairs in a given catalog to get the pair counts 
as a function of separation (either real-space separation for 
^(r), or 9 separation on the sky for w{9)). We similarly count 
the number of pairs in each bin between objects in a partic- 
ular catalog and those in a randomly-distributed catalog, as 
well as the number of pairs between two random catalogs; we 
then calculate cor relation functions by applying the Landy & 
Szalay estimator dLandv & Szalavl [l993). The random cata- 
logs are constructed to have the same shape on the sky (and 
in the case of ^(r), the same redshift distribution) as the cor- 
responding data catalog, but contain objects distributed with 
a uniform distribution on the sky (constant number of objects 
per solid angle, taking spherical geometry into account). In 
all cases we use a random catalog with ^10 times the number 
of objects in the corresponding data catalog. More details of 
our correlation function measurement methods are provided 
in MNIO. 

Throughout this paper we will model ^(r) as a power law, 
^(r) = (r/ro )""'', which is an adequate assumption (within 
the errors for the samples considered here) from ^ 0.5 to 
~ 2Qh~^ comoving Mpc for both observed samp les to z ~ 
1.5 a n d tho s e in the Millennium mock catalogs dCoil et alj 
120061 120081: [Matthews & Newman' '2010); higher-precision 
mea surements w ould likely benefit from a halo- model anal- 
ysis (Zheng & Weinberg 2007; ZhenJetaDIlQOl. The angu- 
lar correlation function can be related to the spatial correlation 
function: if £( r) = (r/ ro)"^, then w(9) = A9^~'', where A oc 
(Peebles"1980'). 

However, since the observed mean galaxy density in a field 
is not necessarily representative of the universal mean density, 
measurements of 1^(6*) from a particular field will in general 
deviate from the global mean by an additive offset known as 
the integral constraint. To remove this effect, we fit w{9) using 
a power law minus a constant, e.g. we consider models of the 
form w(9)=A9^~'^-C, where C is the integral constraint. The 
specific correlation measurements that we use in the calcula- 
tion of 4>piz) are the autocorrelation of the spectroscopic sam- 
ple as a function of separation and redshift, £_ss{r,zy, the angu- 
lar autocorrelation of the photometric sample, Wpp{9); and the 
angular cross-correlation of the two samples with each other 
as a function of spectroscopic redshift, Wsp{9,z)- As we will 
describe below, the redshift distribution of the photometric 
sample can be determined using the fit parameters (in partic- 
ular ro, 7, and A) of each of these three correlation functions. 



3.2. Reconstructing 4>p(z) 

Modeling ^(r) as a power law, we can determine the 
relationship between the angular cross-correlation function 
Wsa(9, z) and the redshi ft distribution. Following the deriva- 
tion m 'NewmanI (120081) (cf. eq. 4), the cross-correlation be- 
tween the photometric sample and spectroscopic objects at 
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redshift z can be written as 



hi^)H{lsp)rl:^e'-'<"'D{z)' 
dl/dz 



(1) 



where //(7) = r(l/2)r((7- l)/2)/r(7/2) (here T{x) is the 
standard Gamma function) and (t>p{z) is the probabihty distri- 
bution function of the redshift of an object in the photometric 
sample. The angular size distance, Diz), and the comoving 
distance to redshift z, l{z), are determined from the basic cos- 
mology. We can see that since Wsp{0,z) ^ 4>p{z) '"o j'^, there is 
a degeneracy in the cross-correlation signal between the red- 
shift distribution and the intrinsic clustering of the two sam- 
ples with each other (characterized by ro.sp and 7^^). We can 
estimate these cross-correlation parameters from the autocor- 
relation measurements of each sample using the assumption 
of linear biasing, for which = {S,ssS,pp)^^^ ■ 

We now outline the details of the particular procedures 
which gave the best reconstruction of 0p(z) in MNIO. First, 
we need to calculate the autocorrelation parameters of each 
sample. To determine how ^j, evolves with redshift we bin 
the spectroscopic objects in redshift and measure the two- 
point correlation function in each z-bin. We calculate us- 
ing the as-observed redshifts from the simulation, which are 
affec ted by redshift space distortions in the line-of-sight direc- 
tion (iHamil ton 1998). To minimize this effect it is common to 
calculate ^ as a function of the projected separation, r^, and 
the line-of-sight separation, tt, and then integrate along the 
line-of-sight to obtain the projected correlation function. 
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where H{j) is defined following equation [T| By measuring 
Wp(rp) in multiple z-bins and fitting with equation [3] we can 
determine ro sAz) and jssiz)- We found that employing a lin- 
ear fit of ro.jj and 7^, as a function of z resulted in a better 
recovery of (t)p{z) than using each bin's value directly. We 
then calculate the angular autocorrelation of the photometric 
sample, Wpp{9), and fit to obtain the parameters App and jpp. 
We use the autocorrelation parameters along with an initial 
guess of ro^pp to calculate rjj^ = (rjl/o^y 'Z^. We found that 
the best results were obtained by assuming the redshift de- 
pendence of the scale length is similar for each sample, i.e. 
nippiz) oc ro,,,(z), with an initial guess of ro^ppiz) = ro^ssiz). 

For the angular cross-correlation between the two samples, 
Wsp{0,z), we bin the spectroscopic sample into small bins in 
redshift and, in each bin, measure the cross-correlation be- 
tween objects in that z-bin with all objects in the photometric 
sample. We can then fit to obtain the parameters Asp, ^sp, and 
Csp- However, we found a significant degeneracy between 
these parameters when fitting. To remove this degeneracy, we 
fix 7sp = (7.SJ + ^pp) /2 in each z-bin, and only fit for the am- 
plitude and integral constraint. We choose this estimate for 
7jp because the clustering of the samples with each other is 
expected to be intermediate to the intrinsic clustering of each 
sample. Using the resulting values of A^p and 7,,,, as well 
as the initial guess for J^, we obtain an initial guess of the 
redshift distribution (ppiz) using equation[T| Using this (ppiz), 
along with App and 7„„, we can redetermine r^ pp using Lim- 
ber's equation (.Peebles., 1980.) . which we use to redetermine 



^o'sp ^^^^ 4'piz)- This process is repeated until conver- 
gence is reached. A more detailed description of this tech- 
nique, including an error analysis for the resulting reconstruc- 
tions, can be found in MNIO. 

We have implemented an additional step in the reconstruc- 
tion of (ppiz) for this paper that was not employed by MNIO. 
For each measurement, after fixing jsp and fitting for Asp and 
Csp in each z-bin, we performed a smooth fit to the measured 
values of Csp{z) as a function of redshift. Using the same jsp 
but fixing Csp at the predicted values for each bin, we then 
fit for Asp. We obtained the best results from a Gaussian fit 
to Csp, although simply smoothing the measured Cspiz) val- 
ues with a boxcar average also resulted in significant gains in 
reconstruction accuracy. We initially tested these techniques 
for MNIO, but they did not improve the reconstruction, and 
in some z-bins made the reconstruction worse. However, after 
incorporating covariance information into our analyses, this 
additional step significantly reduced errors in the reconstruc- 
tion of 4>p(z), likely because the determination of Csp for each 
redshift bin is now more accurate. 

We have also made a change in the methods used to cal- 
culate average correlation measurements from multiple light 
cones. In MNIO this was done by summing the pair counts 
over all of the fields and using the total pair counts in the 
Landy & Szalay estimator. However, in the course of this 
paper we found that this method overestimates the mean cor- 
relation by more heavily weighting those light cones which 
are overdense at a particular redshift: they will both contain 
more pairs and, generally, exhibit stronger clustering than a 
randomly-selected region of the universe. For this paper, we 
instead determine the average correlation by calculating the 
correlation function in each field individually and then per- 
forming an unweighted average of those measurements. This 
change had little effect on the autocorrelation function of the 
photometric sample, Wpp{9), mainly because the larger vol- 
ume sampled meant that the density varies less from field to 
field. The projected autocorrelation of the spectroscopic sam- 
ple, Wp(rp), and the cross-correlation measurements, Wsp{6,z), 
were significantly affected by this change, however, with av- 
erage decreases in the correlation strength of ^ 10-20%. 

3.3. Fitting Parameters Using Full Covariance Information 

In MNIO we fit for the various correlation function param- 
eters (ro.jj, 7js, etc.) assuming that there is no covariance 
between measurements in different angular/r^ bins. We de- 
termined best-fit parameters by performing a minimiza- 
tion where the errors used were given by the standard devia- 
tion of the correlation function measurements in each of the 
24 mock light-cones; i.e. the fitting assumed that the rele- 
vant covariance matrices were all diagonal. However, ana- 
lytical models as well as simulations have shown that the off- 
diagonal element s of the co variance matrix are non-negl igible 
(Bernstein 1994: IZehavi et al. 2005; ICrocce et al.| |20I l'). We 
have confirmed this to be the case by calculating the full co- 
variance matrices of correlation function measurements in the 
24 fields. Therefore, in MNIO we were not exploiting the full 
covariance information when fitting for the correlation func- 
tion parameters. By incorporating this information into our 
fitting process, we should expect to obtain more accurate re- 
sults. 

In order to calculate the parameters using the full covari- 
ance matrix we used minimization as in MNIO, but in this 
case we calculate x' values taking into account the covari- 
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Fig. 1. — An example of fitting a power law-integral constraint model 
to a measurement of the angular autocoiTelation of the photometric sam- 
ple, Wpp{8), from Millennium catalog mock light cones. The solid line is 
a fit assuming no covariance between angular bins, while the dashed line 
is a fit using the full covariance matrix, where both are fit over the range 
0.001° <e< 1.584°. 
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where C is the covariance matrix, y is the observed correla- 
tion function data in each bin, and y is the expected value 
according to a given model. As an example, for w{9) equation 
Hlbecomes: 
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We start by minimizing equation|5]for the case of fixed 7. In 
that case, this minimization is simply linear regression where 
9^~'' is the independent variable, and A and -C are the stan- 
dard 'slope' and 'intercept'. Minimizing analytically to 
obtain the parameters for a linear fit is straightforward; thus 
for fixed 7 we can readily determine the best-fit A and C via 
standard formulae. Alternatively, to fit for all three param- 
eters simultaneously we can repeat the linear fit process for 
different values of 7, and then determine the value of 7 which 
minimizes x^- We use this fitting method to determine the 
parameters of the angular autocorrelation of the photometric 
sample, Wpp{9), and of each z-bin of the angular cross-corre- 
lation, Wsp(9,z). For the projected real-space autocorrelation 
function, we see from equation [3] that Wp{rp) ^ r^~'' (i.e. the 
same as the relation between w(9) and 6), so the fitting method 
is the same except that we force the intercept to be equal to 
zero and only fit for 7 and A. We then find ro using the con- 
version A = r^^Hi'j) from equation[3] Figure[T]compares the fit 
assuming no covariance for one measurement of WppiO) from 
the simulation (averaging Wpp from 4 of the 24 mock fields) 
to a fit using the full covariance matrix. 

The covariance matrices we use for fitting are calculated us- 
ing correlation measurements from the 24 mock light-cones, 
and is therefore a sample covariance matrix and not the 'true', 
underlying C. It can be shown that while the sample covari- 



ance matrix is an unbiased estimator of C, the inverse of the 
sample covariance matrix is in fact a biased estimator fo r 
the inverse of the true covariance matrix jHartlap et alj|2007h . 
The amount of bias depends on the size of the sample used to 
calculate the covariance matrix; in our case, this is the num- 
ber of mock catalogs (24). However, this bias can be corrected 
for (assuming Gaussian statistics and statistically independent 
measurements) simply by rescaling the inverse sample covari- 
ance matrix by a constant factor; this will not, therefore, affect 
the location of any minimum. We apply a bias correction 
where relevant in our analysis below. 

3.3.1. Conditioning the Covariance Matrix 

Since we are using a covariance matrix calculated from a 
modest number of light cones-in effect a 'measured' covari- 
ance matrix with only a limited number of samples-noise 
and numerical instabilities cause difficulties when calculat- 
ing C~'. We found the inversion of C to be much more 
well behaved when using coarser bins in 9 and rp than em- 
ployed in MNIO. For both Wp{rp) and w{9) we doubled the 
bin size in log space, i.e. we use bins with Alog(rp) = 0.2 and 
Alog(6') = 0.2. Increasing the bin size further did not yield 
significant improvements. 

To reduce the impact of noise in our measured covari- 
ance matrix further, we investigated several methods of con- 
ditioning the matrix (i.e., modifying the covariance matrix 
to improve the robustness of its inversion), and looked at 
how varying the conditioning improved the reconstruction. 
One commonly-applied method involves performing a sin- 
gular value decomposition (SVD) of the covariance matrix 
and setting the singular val ues below som e threshold (and 
their inverse) equal to zero (I.Tacksonlll972l: iWiggins 1972). 
This is equivalent to performing an eigenmode analysis and 
trimming any unreso lved modes, as is done, for instance, in 
iMcBride et all (l20Tlh . 

We also tried conditioning the covariance matrix us- 
ing a technique commonly known as ridge regression 
(iHoerl & Kennardl[T^7Q) . This involves adding a small value 
to all of the diagonal elements of the covariance matrix be- 
fore inverting, which reduces the impact of noise in the off- 
diagonal elements and makes the inversion more stable. We 
parameterized this conditioning by calculating the median of 
the diagonal elements of the covariance matrix and adding a 
fraction / of that median value to the diagonal. We obtained 
better results from ridge regression than from zeroing out sin- 
gular values (see 34.1l below). and it is therefore the primary 
method used throughout the rest of this paper 

At first glance it may seem that applying ridge regression 
to the covariance matrix should be detrimental to determin- 
ing the actual values of correlation function parameters: we 
are effectively assuming by fiat that the effective covariance 
matrix to be used in calculating differs from what was mea- 
sured. Since ridge regression yields larger values for the di- 
agonal elements of the covariance matrix than the data them- 
selves would suggest, the results are equivalent to a situation 
with larger nominal measurement uncertainties (and hence 
broader minima) than implied by the original covariance 
matrix. 

However, when C is determined from a limited set of mea- 
surements, C~' tends to differ significantly from the true in- 
verse. Hence, using the standard covariance matrix in fit- 
ting should lead to measurements with nominally tighter er- 
rors than ridge regression techniques, but those measurements 
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Fig. 2. — A test of the impact of the conditioning of the covariance matrix on the results from fitting the ampUtude of the correlation function, A. We plot the 
square root of the fractional median risk (solid line) and of the maximum risk (dashed line) on A as a function of the degree of conditioning. We define the risk 
as the total mean squared error; i.e., the variance plus bias squared. (Left panel) We condition using ridge regression; we add a fraction / of the median of the 
diagonal covariance matrix elements to all diagonal elements in order to stabilize the inversion of the covariance matrix. (Right panel) We condition by inverting 
using singular value decomposition (SVD), setting all singular values below some threshold to zero. The median values are from a single set of H)'* runs, but the 
maximum risk line is the mean of the results from 10 sets of 10^ runs, as the maximum risk varied significantly from ran to ran. Errors on the median are plotted, 
but are very small and not visible. The conditioning has a much larger effect on the maximum risk, and we therefore use a minimax optimization: i.e., choose the 
parameter values which make the maximum risk as small as possible. Using ridge regression, both the median and maximum optimized risk are smaller than for 
the SVD method. We therefore use ridge regression as our primary conditioning technique in this paper; the optimum results in fitting Wpp(6) are achieved for 
/~3%. 



may in fact be significantly offset from the true value of the 
parameter we are attempting to determine. This can cause the 
parameter results to have larger spread about the true value 
than optimal. When we add some degree of ridge regression, 
the inverse of the covariance inatrix is better behaved, and 
hence is less likely to yield a discrepant result. By varying the 
strength of the ridge regression conditioning, we can choose 
different tradeoffs between the bias and variance of parame- 
ter estimates. In general, we want both of these contributions 
to be small; in the next section we investigate what degree of 
conditioning minimizes their sum. 

4. RISK OPTIMIZATION 

In this section we will evaluate how the conditioning of 
the covariance matrix affects the determination of correla- 
tion function parameters and ultimately the reconstruction of 
(j)p(z)- By doing so, we will be able to optimize the reconstruc- 
tion of the true redshift distribution of the photometric sample. 
We assess this by measuring the integrated mean squared er- 
ror, i.e. the variance plus the bias squared. This is commonly 
referred to in statistics literature as the 'risk'. By focusing 
on the risk in some quantity we are optimizing for the mini- 
mum combined effect of variance and bias: either large ran- 
dom errors or large bias would lead to a large risk. We hence 
define the risk to be TZ(X) = ((X-Xtme)^), where X-Xtme is 
the difference between the measured parameter value and its 
true value . At times we wiU also refer to the fractional risk of 



a parameter, which we define as '^(X) = ((X-Xtrue)^)/X^g. 
Since we utiUze three different types of correlation measure- 
ments in the reconstruction of (f)p(z), we look at how changing 
the level of conditioning of the covariance matrix affects each 
one individually. 

4.1. Optimizing Fits Tow pp{6) 

We optimized the conditioning of the covariance matrix for 
the autocorrelation of the photometric sample using a Monte 
Carlo simulation where we use the covariance matrix of Wpp 
calculated from the 24 fields (i.e., the 24 different light cones) 
as our "true" covariance matrix, and then use it to generate 
realizations of correlated noise about a selected model. To 
do this we first find the eigenvalues and eigenvectors of the 
covariance matrix. We create uncorrelated Gaussian noise 
with variances equal to the eigenvalues, and then apply the 
transformation matrix constructed from the eigenvectors to 
this noise. This technique yields mock data with correlated 
noise corresponding exactly to the "true" covariance matrix 
(here, the covariance matrix of the 24 mock fields). For 
the true model we use A,rue = 4.0 x 10""*, ^true = 1-58, and 
Ctrue = 6.5 X 10"'', which are approximately the mean param- 
eters measured from the simulation. 

In MNIO we used the 24 mock light-cones to generate 10"* 
"measurements" by randomly selecting four fields at a time 
and finding the average w(6) for those fields. In order to 
simulate this we used the method for generating correlated 



6 



noise described above to create 24 realizations of single-field 
w{9) measurements, and then generated 10"* randomly se- 
lected 'pick-4 measurements' from those 24 realizations; we 
will refer to each set of 24 new realizations (and its derived 
products) as a 'run' below. For each run we use the set of 24 
realizations to calculate a measured covariance matrix, which 
will differ from the true covariance matrix used to generate the 
noise. The uncertainty in an estimate of the covariance ma- 
trix from the 24 realizations should be worse than the errors 
in realistic applications, making this treatment conservative. 
This is because the area covered by photometric surveys will 
in general be much larger than for the spectroscopic sample, 
which will result in a better constrained covariance matrix for 
the autocorrelation of the photometric sample; however, for 
the mock catalogs used here the spectroscopic and photomet- 
ric areas are identical. The resulting 'measured' covariance 
matrix for a given run is then used to fit for the parameters 
of a power-law fit in each of that run's pick-4 measurements 
by minimizing (cf. equation|4|. For this and all other cor- 
relation function fits described herein we used the IDL code 
POWERFIT, which is being publicly released as an accompa- 
niment to this paper 

We begin by evaluating how the reconstruction of the am- 
plitude, A, changes as we vary the conditioning. The integral 
constraint exhibits similar behavior to the amplitude since it 
is proportional to the correlation strength; we are in any event 
not as concerned with the behavior of C since it is essentially 
a nuisance parameter. For simplicity, we fix 7 at the true value 
for each run and only fit for A and C. We calculate the risk on 
A by performing 10"* runs, where for each run we: 

1 . Created 24 realizations of w(9) as described above 

2. Generated 10"* pick-4 measurements, randomly select- 
ing four realizations at a time from the 24 and calculat- 
ing their mean w(9) 

3. Fit each pick-4 measurement for A and C using the co- 
variance matrix calculated from the 24 realizations cre- 
ated in step 1 

4. Calculated the mean fractional risk on A over the 10^ 
pick-4 measurements, TZiA) = {{A- At^ue)^) I A^tme- 

We can perform the fits and calculate the fractional risk on 
A while applying varying levels of conditioning on the co- 
variance matrix. We parameterize the ridge regression condi- 
tioning using a variable /, which we define as the fraction of 
the median value amongst diagonal elements of the covari- 
ance matrix which is added to the diagonal elements; i.e., 
we replace the /, / element of the covariance matrix, C„, by 
C,7 + / X median(C,7). For comparison, we also calculate the 
fractional risk on A while varying the singular value threshold 
for the SVD conditioning described in ^3.3.11 where all sin- 
gular values below the threshold and their inverses are set to 
zero. 

Figure|2]shows the square root of the median and maximum 
fractional risk amongst the 10"* runs as a function of both / 
and the singular value threshold. In both cases we see that 
the conditioning has a much stronger effect on the maximum 
risk than it does on the median. We therefore perform a mini- 
max optimization; i.e., we choose the conditioning that mini- 
mizes the maximum risk. Looking at the level of conditioning 
corresponding to this minimax optimization for each method, 
we see that the median and maximum risk are both smaller 
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Fig. 3. — Contour plot showing the distribution of the median values of 
A— Af,„e and C — Cnue from each of lO** runs as described in ^4.11 where A 
and C are the fit parameters for \v(6) = AS'"'' -C. For our model we used 
A/riif = 4.0 X ICr"* and Qme = 6-5 X l(r^^. For each distribution we show the 
la and 2cr contours. The solid lines are the fit parameters when using the 
full covariance matrix with the optimized conditioning (/ = 3%). The dashed 
lines show the distribution resulting from fits with the same techniques as 
MNIO, where we assume no covariance and fit over a smaller 8 range. We are 
most concerned with errors in the amplitude; it is clear there is a significant 
improvement in the recovery of the actual value of A when the full covariance 
information is exploited. 

for the ridge regression conditioning. In addition, with the 
SVD method the maximum risk is much more sensitive to 
changes in the threshold around its optimized value. Small 
changes from the optimized threshold value in either direc- 
tion can have a significant effect on the maximum risk, while 
the maximum risk curve for the ridge regression method is 
relatively flat in the vicinity of the optimized value. We there- 
fore use ridge regression conditioning for the remainder of 
the calculations. By adding a few percent conditioning to our 
covariance matrix with the ridge regression method, we can 
significantly decrease the maximum risk without significantly 
worsening the median risk. The optimized value for / strikes 
a balance between the need for conditioning to stabilize in- 
version and the desire not to distort the relative impact of di- 
agonal and off-diagonal covariance matrix elements, which 
would lead to inappropriate weighting of different data points 
in calculating x^- 

Figure [3] shows a contour plot of the median values of A- 
A,rue VS. C-Cirue amongst all pick-4 measurements for each 
of the 10"* runs using the optimized conditioning (/ = 3%). In 
MN 1 0, although we had measured the correlation function out 
to a separation 0~1.584°,we only fit over the range . 00 1 ° < 
6* < 0.1°. In that case, fitting over this smaller range reduced 
the error in A, and thus improved the reconstruction. When 
using the full covariance matrix for the fit we found that fitting 
over the full range of 6 yielded even smaller parameter errors, 
as seen in Figure[3] By utilizing covariance information in our 
fitting, we can robustly incorporate correlation measurements 
from larger scales which were useless (or even detrimental) 
when ignoring the covariance. 



7 



4.2. Optimizing Fits To Wp(rp) 

We used a different method to optimize the conditioning for 
the projected correlation function of the spectroscopic sample. 
As described in ^ this sample was constructed by selecting 
60% of the objects with /? < 24.1. We calculated the risk for 
the autocorrelation parameters by creating multiple samples 
where a different 60% of the objects are chosen each time, 
and comparing these to the results for a sample containing 
100% of the objects. This differs from the method used in 
^4.11 in that we are actually performing the correlation mea- 
surements using the simulations rather than generating model 
noise based on a covariance matrix calculated from the simu- 
lation. In the case of Wpp(9) it was more difficult to determine 
the true values of w{9) (required for calculating the risk) to 
significantly greater accuracy than individual measurements, 
and therefore we relied on synthetic techniques for that anal- 
ysis. Here, we have a 'truth' measurement which is much 
better than the fits resulting from any set of 60% of the bright 
objects in only four fields, so we can measure the risk robustly 
without relying on simulated measurements. When calculat- 
ing the reconstruction of <j>p{z) we measure the parameters of 
a fit to Wp{rp) in multiple redshift bins. For simplicity, in this 
section we focus on a single z-bin in the middle of the redshift 
range, 0.613 < z < 0.704; we expect similar results for the 
other redshift bins. 

To begin we generate 10"* pick-4 measurements of Wp(rp) 
from the full sample and fit each measurement to the func- 
tional form given in equation[3l employing the full covariance 
matrix calculated from the 24 fields to determine ro and 7. As 
in MNIO, we fit over the range 0.1 < rp < 10 /i"'Mpc. Since 
the covariance matrix calculated from the full sample should 
be more stable than for the 60% subsets due to its smaller 
noise, we initially performed the fits with zero conditioning 
and used that as our 'truth' . The median values of the parame- 
ter measurements for the full sample amongst the 24 different 
fields were used as estimates of the true parameter values. We 
then calculate the risk on ro and 7 by performing 100 runs, 
where for each run we: 

1 . Constructed samples from each of the 24 mock fields by 
randomly selecting 60% of the objects with R < 24. 1 

2. Generated 10^ pick-4 measurements, randomly select- 
ing four fields at a time from the 24 and calculating their 
mean Wp{rp) 

3. Fit each pick-4 measurement for ro and 7 using the co- 
variance matrix calculated from the Wpifp) values mea- 
sured using the 24 samples constructed in step iQ 

4. Calculated the mean fractional risk on both param- 
eters, i.e. 7l(ro) = ((ro - ro,,™^)^)/'-,^,,™^ and 7^(7) = 
{il~ltrue)^) hmie^ o^^^ '^c lO'* pick-4 measurements. 

In step 3 we calculate the covariance matrix from 24 fields, 
which is more fields than we would actually have if we were 
to do cross-correlation reconstruction with current datasets at 
z ^ 1 . However, it is likely comparable to the level to which 
we should be able to determine the covariance matrix using 

' In MNIO, we corrected Wp(rp) for the fact that i,ss(rp,'iT) is not in ac- 
tuaUty measured to infinite line-of-sight separation. This was not done for 
tliis test, as the correction will affect the parameters of the full sample and its 
subsets in a similar way, so any trends in the risk should not be affected. This 
saved significant calculation time. 
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Fig. 4. — The square root of the fractional median risk (eiTor bars) and 
maximum risk (dashed line) on roj, (upper curves) and 7,, (lower curves) 
as a function of the degree of conditioning used for 100 runs, where 60% of 
objec ts with R < 24.1 were selected at random for each run, as described in 
84.21 The conditioning has a much larger effect on the maximum risk for 
both parameters, and we therefore use a minimax optimization, i.e. /=3.5%. 

current-generation deep mock catalogs, particularly since fit 
results will be sensitive to the relative values of covariance 
matrix elements, but not their absolute normalization. For 
each run we calculate the fractional risk on both parameters 
for varying levels of conditioning. 

FigureUshows the square root of the median and maximum 
fractional risk on ro and 7 amongst the 100 runs as a function 
of the conditioning. For both parameters we see a slight dip 
in the median risk over the 100 runs at / ^ 0.5%, but this rep- 
resents only a minimal improvement. Once again we see the 
conditioning has a much more significant impact on the max- 
imum risk. We optimize our fits by choosing the conditioning 
value that minimizes the maximum risk (/ ^ 3.5%). 

4.3. Optimizing (ppiz) Reconstruction 

After optimizing the fits for the autocorrelation measure- 
ments, we then looked at how conditioning the cross-correla- 
tion covariance matrices affects the overall reconstruction of 
(t)p{z)- Since the uncertainty in 4)p{z) is dominated by the un- 
certainty in Wsp{6,z), this conditioning should have the great- 
est impact on the reconstruction. We generate 10"^ pick-4 mea- 
surements by averaging the correlation measurements from 
four randomly selected fields out of the 24, which we then 
use to calculate 4)p{z)- For calculating the risk, we know the 
true redshift distribution in each field perfectly from the sim- 
ulation, so we do not need to rely on synthetic techniques as 
in ^4.11 Since the fits for both WppiO) and Wpifi,) were bes t 
with a few percent ridge regression conditioning 04.11 ^4.21 ). 
for simplicity we adopt /=3.5% as the optimal conditioning 
in both cases. 

For each pick-4 measurement, we determine the autocor- 
relation parameters of the photometric sample by fitting the 
Wpp{9) from the selected 4 fields using the optimally condi- 
tioned covariance matrix calculated from the 24 fields. All 
three parameters (App, jpp, and Cpp) are left free and fit si- 
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multaneously. To measure the evolution of the correlation 
function parameters of the spectroscopic sample, we calcu- 
lated Wp{rp) in 10 z-bins covering the range 0.11 < z < 1.4, 
where the size and location of each z-bin was selected such 
that there were approximately the same number of objects in 
each one. In each z-bin we calculate the covariance matrix 
from the 24 fields and fit each pick-4 measurement using the 
optimal conditioning to determine ro^ssiz) and 7jj(z). 

In one redshift bin (0.11 < z < 0.268), the values of ro,.?., 
and obtained with these methods were significantly dif- 
ferent from the values determined when assuming no covari- 
ance. We investigated the likelihood contours in detail and 
found they were not well behaved; not only were the me- 
dian parameter values different from the result with no co- 
variance, the standard deviation of the 10"^ pick-4 measure- 
ments proved to be an underestimate of the uncertainty in that 
bin, which had significant effects when performing an error- 
weighted linear fit to ro^ss(z) and -fssiz)- We attempted a variety 
of methods for estimating the errors in that bin with poor re- 
sults. However, we found that fitting over the shorter range 
0.25 < Tp < 10 /!"'Mpc, rather than 0.1 < r,, < 10 /i"'Mpc, 
gave more well behaved values (more consistent with the val- 
ues in other redshift bins or those obtained when ignoring co- 
variance) and improved the reconstruction. For consistency 
we fit over this range for all bins where z < 0.8. As in MNIO 
we continue to fit over the range 1.0 < < 10 /i"'Mpc for 
z > 0.8, as in the Millennium simulations (though less so in 
real datasets) Wp(rp) diverges significantly from a power law 
at 0.1 < rp < 1 /i"'Mpc. 

While the conditioning of the fits for the autocorrelation 
parameters was kept the same for each measurement, we var- 
ied the conditioning of the cross-correlation fits to see how it 
affects the reconstruction. We bin the spectroscopic sample 
over the range 0.19 < z < 1.39 with a bin size of Az = 0.04 
and measure Wsp{d) in each bin. At each level of conditioning 
we: 

1. Calculated the covariance matrix of Wsp{0) in each red- 
shift bin from the 24 fields and apply the ridge regres- 
sion conditioning to each matrix 

2. Generated lO'* pick-4 measurements, randomly select- 
ing four fields at a time from the 24 and calculating their 
mean Wsp(9,z) 

3. In each z-bin, fit the pick-4 measurements for Aj^ and 
Csp, fixing as described in ^3.21 using the covariance 
matrices calculated in step 1 

4. Combined Asp(z) and the optimized autocorrelation pa- 
rameters for each pick-4 measurement to calculate the 
probability distribution function, (ppiz), applying equa- 
tion [T] 

5. For each pick-4 measurement, we calculated the mean 
risk on (f>p(z), TZ{(j)p(z)) = {{(lip{z)-4>pjn,e{z)f), over the 
range 0.4 < z < 1 .2. This was done in two ways: 

(a) Using the overall mean 4)p{z) of the 24 fields as 

<Pp.tnie{z) 

(b) Using the mean 0p(z) from the particular 4 fields 
used in a given measurement as (pp.trueiz) 

6. Calculated the mean Tl(<j)p(z)) over the 10'* pick-4 mea- 
surements for both types of risk 
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Fig. 5. — The square root of the mean risk over the range 0.4 < z < 1-2 
for the reconstruction as a function of the degree of conditioning applied to 
the covariance matrix of Wsp(8) in each redshift bin. The solid line is the 
risk compared to the overall mean of the 24 fields, and the star symbol is the 
corresponding risk using the methods of MNIO. The dashed red(gray) line 
is the risk defined from comparing each measurement to the mean redshift 
distribution of the particular 4 fields used, and the red(gray) diamond symbol 
is the corresponding risk using the previous method. Both are at or near their 
minimum value at a conditioning of a few percent. The decrease in the risk 
when comparing to the overall mean is much greater, though improvements 
are significant regardless of the measure used. 

In step 5, we calculate the risk over a slightly limited redshift 
range to eliminate bins where noise dominates the measure- 
ments, which diluted our ability to assess the impact of ridge 
regression. 

Figure |5] shows both mean risks as a function of the con- 
ditioning, compared to the risk using methods identical to 
MN 10. We optimized for the mean risk over the redshift range 
rather than the maximum risk as the latter was dominated by 
random outliers (due to the smaller number of objects in the 
redshift bins used, errors in Wjp(0,z) are much larger, and 
hence random excursions extend further, than for the auto- 
correlations). Both techniques indicate that the minimum risk 
is obtained at around a few percent conditioning. There is a 
substantial improvement in both measures, but particularly in 
the risk comparing the redshift distribution for the four cho- 
sen fields to the overall (e.g. universal) mean. Figure|6]shows 
the reconstruction for 3.5% conditioning (i.e. the same for all 
three fits) as well as the variance and bias, and compares to 
the reconstruction using methods identical to MNIO. The de- 
crease in the variance is significant in each redshift bin while 
the bias is relatively unchanged in all but a few z-bins. By 
incorporating full covariance information and ridge regres- 
sion methods, the square root of the fractional risk is < 40% 
smaller than that resulting from our prior methods. 

5. SUMMARY AND CONCLUSION 

In this paper we hav e improved on t he cross-correlation 
techniques presented in [Matthews & New man (2010) by in- 
corporating full covariance information. In addition, we have 
demonstrated the improvements that result from incorporat- 
ing ridge regression in fitting for correlation function parame- 
ters. Conditioning using ridge regression allowed us to obtain 
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Fig. 6. — The reconstruction of (ppiz) using 3.5% conditioning for fits to all three correlation measurements, (i.e. Wpp(9), Wp(rp), Wsp(6.z)). In the top panel, 
the solid line is the mean true distribution of the 24 fields, the star symbols are the median values of the 10'' pick-4 measurements obtained using the methods of 
MNIO, and the diamonds are the median values for the optimized reconstruction using the full covariance matrix for the fits (with error bars). The middle panel 
compares the standard deviation of the 10'' pick-4 measurements in each bin using the methods from MNIO (solid line) to the improved reconstruction (dashed 
line), while the bottom panel compares the bias. The errors are significantly smaller in each bin, while the bias is comparable when full covariance information 
is used. These results are not significantly changed for moderate changes in /. 



a more stable inversion of the covariance matrix by reducing 
the impact of noise in the off diagonal elements, resulting in 
better estimates of the correlation function parameter values; 
results were significantly better than with other commonly- 
used methods such as zeroing out small singular values in a 
singular value decomposition of the covariance matrix. We 
analyzed how this conditioning affected the integrated mean 
squared error, i.e. the risk, for these parameter measurements, 
and in doing so optimized the cross-correlation technique for 
recovering the redshift distribution of a photometric sample 
with unknown redshifts. We also found that we gain signif- 
icant improvement in the reconstruction by adding a step to 
the recipe described in MNIO: we now perform a smooth fit 
for the amplitude of the integral constraint of the cross-corre- 
lation measurements as a function of redshift, Csp(z). We then 
refit for the amphtude of the cross-correlation, Agp, with Csp 
fixed at the smooth fit value in each z-bin. 



We tested the effect of the ridge regression technique on the 
calculation of parameter values for both w(6) and Wpii-p) and 
found that it had a much more significant impact on the max- 
imum risk found over multiple runs than on the median risk. 
In other words, it yields a great improvement in the worst- 
case errors, but smaller improvements in more typical cases. 
For w(6) the square root of the maximum fractional risk in 
the amplitude. A, for fixed 7 decreased by ^ 35% on average 
at a few percent conditioning. For Wp(rp) we found a simi- 
lar decrease for ro..?., 29%), while the decrease for 7,5 was 
somewhat smaller (~ 20%)-although still significant. After 
implementing the changes described above to the recipe de- 
scribed in MNIO we found that adding just a few percent of 
the ridge regression conditioning to each covariance matrix 
used in the calculation resulted in a significant improvement 
in the cross-correlation reconstruction. When conditioning all 
covariance matrices at the level of 3.5% there was ~ 42% de- 
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crease in the mean of the square root of the risk on the recov- 
ered (j)p(z) compared to the overall (i.e. universal) mean 4>p(z), 
and ^ 16% decrease when comparing the recovered (j>p(z) to 
the mean of the actual (t)p{z) for the particular four fields used 
in the measurement. 

In this paper, as well as in MNIO, we utilized an artificial 
selection for objects in a photometric redshift bin that con- 
sisted simply of selecting objects with a Gaussian probability 
centered at a mean redshift. However, in real samples, pho- 
tometric redshift errors should depend on galaxy type (and 
hence biasing) and will not necessarily lead to Gaussian selec- 
tion probabilities or uniform evolution of bias with true red- 
shift within the selected sample. Therefore in a future paper, 
we will test this improved cross-correlation technique using 



these same mock catalogs but instead of assuming a redshift 
distribution, we will use simulated photometry to measure 
photometric redshifts and try to recover the redshift distribu- 
tion of various photo-z selected bins. The following paper in 
this series will apply these techniques to test photometric red- 
shifts for galaxies in the CANDELS multiw avelength survey 
jGrogin et al.ll201 ll iKoekemoer et al.ll201 Ih . 

This paper has benefited from helpful discussions with 
Larry Wasserman, Chris Genovese, Peter Freeman, Chad 
Schafer, Ann Lee, Nikhil Padmanabhan and Michael Wood- 
Vasey. It was supported by the United States Department of 
Energy Early Career program via grant ide-sc0 003960 



APPENDIX 
POWERFITCODE 

In the course of this analysis we developed a short IDL function designed to fit for the parameters of a power-law plus constant 
model using full covariance information, with or without conditioning of the covariance matrix. Given arrays containing the 
independent variable values x, the dependent variable values y, and the covariance matrix of the y values, C, it determines the 
best-fit parameters for a function of the form y = ax^ + c via minimization (cf. Equation |4]i. It outputs the best-fit parameter 
values in the form of a three-element array, i.e. [a,b,c]. POWERFIT calculates the fit parameters as described in ^3.31 If the 
exponent, b, is fixed, the best-fit values of a and c are calculated analytically using standard linear regression formulae. To fit 
for all three parameters simultaneously, POWERFIT instead uses the AMOEBA function (distributed with IDL, and based on the 
routine amoeba described in Numerical Recipes in C (iPress et al.lll992h ) to search for the exponent value that minimizes the 
of the fit. 

POWERFIT optionally allows the user to fix either the exponent value, b, the constant, c, or both, at specified values when 
calculating the fit. It is also possible to condition the covariance matrix using either of the methods described in 33.3.11 For ridge 
regression conditioning, the user must provide a value for /, the fraction of the median of the diagonal elements of the covariance 
matrix to add to the diagonal elements before inverting. For S VD conditioning, the required input is the singular value threshold; 
any singular values below that threshold, as well as their inverses, are set equal to zero before calculating the inversion. The code 
is suitable for any application where a power law or power law plus constant model is fit to data with a known covariance matrix; 
it can be downloaded at h ttp : / / www . phyast .pitt .edu/~ j anewman /power flt| 
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